XPath Example

XPath (XML Path Language) is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) and can be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.


Overview

The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as "an XPath".

Originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT, subsets of the XPath query language are used in other W3C specifications such as XML Schema, XForms and the Internationalization Tag Set (ITS).

XPath has been adopted by a number of XML processing libraries and tools, many of which also offer CSS Selectors, another W3C standard, as a simpler alternative to XPath.


Versions

There are several versions of XPath in use. XPath 1.0 was published in 1999, XPath 2.0 in 2007 (with a second edition in 2010), XPath 3.0 in 2014, and XPath 3.1 in 2017. However, XPath 1.0 is still the version that is most widely available.

* XPath 1.0 became a Recommendation on 16 November 1999 and is widely implemented and used, either on its own (called via an API from languages such as Java, C#, Python or JavaScript), or embedded in languages such as XSLT, XProc, XML Schema or XForms.
* XPath 2.0 became a Recommendation on 23 January 2007, with a second edition published on 14 December 2010. A number of implementations exist but are not as widely used as XPath 1.0. The XPath 2.0 language specification is much larger than XPath 1.0 and changes some of the fundamental concepts of the language such as the type system.
  * The most notable change is that XPath 2.0 is built around the XQuery and XPath Data Model (XDM) that has a much richer type system. Every value is now a sequence (a single atomic value or node is regarded as a sequence of length one). XPath 1.0 node-sets are replaced by node sequences, which may be in any order.
  * To support richer type sets, XPath 2.0 offers a greatly expanded set of functions and operators.
  * XPath 2.0 is in fact a subset of XQuery 1.0. They share the same data model (XDM). It offers a for expression that is a cut-down version of the "FLWOR" expressions in XQuery. It is possible to describe the language by listing the parts of XQuery that it leaves out: the main examples are the query prolog, element and attribute constructors, the remainder of the "FLWOR" syntax, and the typeswitch expression.
* XPath 3.0 became a Recommendation on 8 April 2014. The most significant new feature is support for functions as first-class values. XPath 3.0 is a subset of XQuery 3.0, and most implementations as of April 2014 existed as part of an XQuery 3.0 engine.
* XPath 3.1 became a Recommendation on 21 March 2017. This version adds new data types: maps and arrays, largely to underpin support for JSON.


Syntax and semantics (XPath 1.0)

The most important kind of expression in XPath is a ''location path''. A location path consists of a sequence of ''location steps''. Each location step has three components:

* an ''axis''
* a ''node test''
* zero or more ''predicates''.

An XPath expression is evaluated with respect to a ''context node''. An axis specifier such as 'child' or 'descendant' specifies the direction to navigate from the context node. The node test and the predicate are used to filter the nodes specified by the axis specifier: for example, the node test 'A' requires that all nodes navigated to must have label 'A'. A predicate can be used to specify that the selected nodes have certain properties, which are specified by XPath expressions themselves.

The XPath syntax comes in two flavors. The ''abbreviated syntax'' is more compact and allows XPaths to be written and read easily using intuitive and, in many cases, familiar characters and constructs. The ''full syntax'' is more verbose, but allows for more options to be specified, and is more descriptive if read carefully.


Abbreviated syntax

The compact notation allows many defaults and abbreviations for common cases. Given source XML containing at least an outermost A element with a B child that in turn has a C child, the simplest XPath takes a form such as

* /A/B/C

that selects C elements that are children of B elements that are children of the A element that forms the outermost element of the XML document. The XPath syntax is designed to mimic URI (Uniform Resource Identifier) and Unix-style file path syntax.

More complex expressions can be constructed by specifying an axis other than the default 'child' axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression

* A//B/*[1]

selects the first child ('*[1]'), whatever its name, of every B element that itself is a child or other, deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/'). Note that the predicate [1] binds more tightly than the / operator. To select the first node selected by the expression A//B/*, write (A//B/*)[1]. Note also that index values in XPath predicates (technically, 'proximity positions' of XPath node sets) start from 1, not 0 as is common in languages like C and Java.
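The abbreviated paths above can be tried with Python's standard-library xml.etree.ElementTree module, which implements only a subset of XPath. The following is a minimal sketch under stated assumptions, not a reference implementation: the sample document is invented, the paths are written relative to a context element, and the positional step *[1] is emulated with ordinary list indexing instead of being passed to ElementTree, whose position predicates are documented as needing a preceding tag name.

```python
import xml.etree.ElementTree as ET

# Invented sample: the context node is <root>, which has one A child.
root = ET.fromstring(
    "<root><A>"
    "<B><C/></B>"
    "<X><B><D/><C/></B></X>"
    "</A></root>"
)

# A/B/C: C children of B children of A children of the context node.
direct_c = root.findall("A/B/C")

# A//B: every B that is a child or deeper descendant of an A child.
all_b = root.findall("A//B")

# A//B/*[1]: the first child of *each* B.
# XPath positions start at 1, Python list indexes at 0.
first_child_of_each_b = [list(b)[0].tag for b in all_b if len(b) > 0]

# (A//B/*)[1]: the first node of the whole result, a single element.
all_children = [child for b in all_b for child in b]
first_overall = all_children[0].tag

print(len(direct_c), first_child_of_each_b, first_overall)
```

The two last results differ in exactly the way the text describes: the predicate applied per step yields one node per B element, while the parenthesized form yields one node in total.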


Expanded syntax

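In the full, unabbreviated syntax, the two examples above would be written

* /child::A/child::B/child::C
* child::A/descendant-or-self::node()/child::B/child::*[position()=1]

Here, in each step of the XPath, the axis (e.g. child or descendant-or-self) is explicitly specified, followed by :: and then the node test, such as A or node() in the examples above. The second expression can also be written in a shorter, mixed form: A//B/*[position()=1].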
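To make the axis::node-test[predicate] anatomy of an unabbreviated step concrete, here is a small illustrative splitter. It is a sketch only: the names Step and split_steps are hypothetical, and the regular expressions deliberately ignore cases a real XPath parser must handle, such as a / or nested brackets inside a predicate.

```python
import re
from typing import NamedTuple, Tuple


class Step(NamedTuple):
    """One location step: axis, node test, and zero or more predicates."""
    axis: str
    node_test: str
    predicates: Tuple[str, ...]


# axis::node-test followed by zero or more [predicate] groups.
STEP_RE = re.compile(r"^([a-z-]+)::([^\[\]]+)((?:\[[^\[\]]*\])*)$")
PRED_RE = re.compile(r"\[([^\[\]]*)\]")


def split_steps(path):
    """Split an unabbreviated location path into Step tuples (simplified)."""
    steps = []
    for raw in path.strip("/").split("/"):
        match = STEP_RE.match(raw)
        if match is None:
            raise ValueError("not an unabbreviated step: %r" % raw)
        axis, node_test, preds = match.groups()
        steps.append(Step(axis, node_test, tuple(PRED_RE.findall(preds))))
    return steps


steps = split_steps(
    "child::A/descendant-or-self::node()/child::B/child::*[position()=1]"
)
for step in steps:
    print(step)
```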


Axis specifiers

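Axis specifiers indicate navigation direction within the tree representation of the XML document. The axes available in XPath 1.0 are:

* ancestor
* ancestor-or-self
* attribute (abbreviated @; @abc is short for attribute::abc)
* child (the default axis; xyz is short for child::xyz)
* descendant
* descendant-or-self (// is short for /descendant-or-self::node()/)
* following
* following-sibling
* namespace
* parent (.. is short for parent::node())
* preceding
* preceding-sibling
* self (. is short for self::node())

As an example of using the attribute axis in abbreviated syntax, //a/@href selects the attribute called href in a elements anywhere in the document tree. The expression . (an abbreviation for self::node()) is most commonly used within a predicate to refer to the currently selected node. For example, h3[.='See also'] selects an element called h3 in the current context, whose text content is See also.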
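The attribute, self and parent axes can be approximated with Python's xml.etree.ElementTree, again as a hedged sketch rather than a full XPath evaluation. The sample markup is invented. ElementTree returns elements only, so the attribute axis is emulated by reading the attribute from each matched element, and the [.='text'] predicate assumes Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Invented sample document; <body> acts as the context node.
body = ET.fromstring(
    "<body>"
    "<h3>See also</h3>"
    "<p><a href='x.html'>x</a> <a>no link</a></p>"
    "<h3>Notes</h3>"
    "</body>"
)

# //a/@href: collect the href attribute of every a element that has one.
hrefs = [a.get("href") for a in body.iter("a") if a.get("href") is not None]

# h3[.='See also']: h3 children of the context whose text is "See also".
see_also = body.findall("h3[.='See also']")

# .//a/..: the parent axis, here the distinct parents of all a elements.
parent_tags = {parent.tag for parent in body.findall(".//a/..")}

print(hrefs, len(see_also), parent_tags)
```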


Node tests

Node tests may consist of specific node names or more general expressions. In the case of an XML document in which the namespace prefix gs has been defined, //gs:enquiry will find all the enquiry elements in that namespace, and //gs:* will find all elements, regardless of local name, in that namespace.

Other node test formats are:

* comment(): finds an XML comment node, e.g. <!-- Comment -->
* text(): finds a node of type text excluding any children, e.g. the hello in <k>hello<m> world</m></k>
* processing-instruction(): finds XML processing instructions such as <?php echo $a; ?>. In this case, processing-instruction('php') would match.
* node(): finds any node at all.
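Namespace-qualified name tests can be sketched with xml.etree.ElementTree by passing a prefix-to-URI mapping. The namespace URI and the sample document below are hypothetical, chosen only for illustration. The wildcard test gs:* is emulated with a plain tag check so that the sketch does not depend on wildcard support in a particular Python version.

```python
import xml.etree.ElementTree as ET

GS = "http://example.com/gs"  # hypothetical namespace URI bound to prefix gs

doc = ET.fromstring(
    f'<root xmlns:gs="{GS}">'
    '<gs:enquiry id="1"/>'
    '<enquiry id="2"/>'  # same local name, but in no namespace
    '<gs:reply><gs:enquiry id="3"/></gs:reply>'
    "</root>"
)

# //gs:enquiry: all enquiry elements in the gs namespace, at any depth.
enquiry_ids = [e.get("id") for e in doc.findall(".//gs:enquiry", {"gs": GS})]

# //gs:*: all elements in the gs namespace regardless of local name.
# ElementTree stores qualified names as "{uri}local".
in_namespace = [e.tag for e in doc.iter() if e.tag.startswith("{" + GS + "}")]

print(enquiry_ids, len(in_namespace))
```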


Predicates

Predicates, written as expressions in square brackets, can be used to filter a node-set according to some condition. For example, a returns a node-set (all the a elements which are children of the context node), and a[@href='help.php'] keeps only those elements having an href attribute with the value help.php.

There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (i.e. that of the immediately preceding node test) and do not alter that context. All predicates must be satisfied for a match to occur.

When the value of the predicate is numeric, it is syntactic sugar for comparing against the node's position in the node-set (as given by the function position()). So p[1] is shorthand for p[position()=1] and selects the first p element child, while p[last()] is shorthand for p[position()=last()] and selects the last p child of the context node.

In other cases, the value of the predicate is automatically converted to a boolean. When the predicate evaluates to a node-set, the result is true when the node-set is non-empty. Thus p[@x] selects those p elements that have an attribute named x.

A more complex example: the expression a[/html/@lang='en'][@href='help.php'][1]/@target selects the value of the target attribute of the first a element among the children of the context node that has its href attribute set to help.php, provided the document's html top-level element also has a lang attribute set to en. The reference to an attribute of the top-level element in the first predicate affects neither the context of other predicates nor that of the location step itself.

Predicate order is significant if predicates test the position of a node. Each predicate takes a node-set and returns a (potentially) smaller node-set. So a[1][@href='help.php'] will find a match only if the first a child of the context node satisfies the condition @href='help.php', while a[@href='help.php'][1] will find the first a child that satisfies this condition.
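The effect of predicate order can be illustrated by treating each predicate as a function from a node list to a smaller node list. This is a sketch in plain Python over an invented fragment: the helper names keep_help and keep_first are hypothetical, and only the simple attribute predicate is handed to ElementTree itself.

```python
import xml.etree.ElementTree as ET

# Invented context node with three a children.
ctx = ET.fromstring(
    "<div>"
    "<a href='index.php'/>"
    "<a href='help.php' target='_blank'/>"
    "<a href='help.php'/>"
    "</div>"
)

anchors = ctx.findall("a")                   # a
helps = ctx.findall("a[@href='help.php']")   # a[@href='help.php']


def keep_help(nodes):
    """Emulates the predicate [@href='help.php']."""
    return [n for n in nodes if n.get("href") == "help.php"]


def keep_first(nodes):
    """Emulates the predicate [1] (XPath position 1 is Python index 0)."""
    return nodes[:1]


# a[1][@href='help.php']: take the first a, then test its href.
first_then_filter = keep_help(keep_first(anchors))

# a[@href='help.php'][1]: filter by href, then take the first survivor.
filter_then_first = keep_first(keep_help(anchors))

print(len(first_then_filter), filter_then_first[0].get("target"))
```

Because the first a child here points to index.php, the first ordering matches nothing, while the second returns the a element whose target is _blank.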


Functions and operators

XPath 1.0 defines four data types: node-sets (sets of nodes with no intrinsic order), strings, numbers and booleans.

The available operators are:

* The /, // and [...] operators, used in path expressions, as described above.
* A union operator, |, which forms the union of two node-sets.
* Boolean operators and and or, and a function not()
* Arithmetic operators +, -, *, div (divide), and mod
* Comparison operators =, !=, <, >, <=, >=

The function library includes:

* Functions to manipulate strings: concat(), substring(), contains(), substring-before(), substring-after(), translate(), normalize-space(), string-length()
* Functions to manipulate numbers: sum(), round(), floor(), ceiling()
* Functions to get properties of nodes: name(), local-name(), namespace-uri()
* Functions to get information about the processing context: position(), last()
* Type conversion functions: string(), number(), boolean()

Some of the more commonly useful functions are detailed below.


Node set functions

* position(): returns a number representing the position of this node in the sequence of nodes currently being processed (for example, the nodes selected by an xsl:for-each instruction in XSLT).
* count(node-set): returns the number of nodes in the node-set supplied as its argument.


String functions

* string(object?): converts any of the four XPath data types into a string according to built-in rules. If the value of the argument is a node-set, the function returns the string-value of the first node in document order, ignoring any further nodes.
* concat(string, string, string*): concatenates two or more strings.
* starts-with(s1, s2): returns true if s1 starts with s2.
* contains(s1, s2): returns true if s1 contains s2.
* substring(string, start, length?): example: substring("ABCDEF",2,3) returns "BCD".
* substring-before(s1, s2): example: substring-before("1999/04/01","/") returns 1999.
* substring-after(s1, s2): example: substring-after("1999/04/01","/") returns 04/01.
* string-length(string?): returns the number of characters in the string.
* normalize-space(string?): all leading and trailing whitespace is removed and any sequences of whitespace characters are replaced by a single space. This is very useful when the original XML may have been prettyprint formatted, which could make further string processing unreliable.
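The behaviour of several of these string functions can be approximated in plain Python. The helpers below are illustrative re-implementations with hypothetical names, not part of any XPath library. They assume integer arguments for substring and so skip the rounding rules that XPath 1.0 applies to non-integer positions.

```python
import re


def substring(s, start, length=None):
    """Approximates substring(); XPath positions are 1-based."""
    begin = max(start - 1, 0)
    if length is None:
        return s[begin:]
    end = max(start - 1 + length, 0)
    return s[begin:end]


def substring_before(s, sep):
    """Text before the first occurrence of sep, or "" if sep is absent."""
    idx = s.find(sep)
    return s[:idx] if idx >= 0 else ""


def substring_after(s, sep):
    """Text after the first occurrence of sep, or "" if sep is absent."""
    idx = s.find(sep)
    return s[idx + len(sep):] if idx >= 0 else ""


def normalize_space(s):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


print(substring("ABCDEF", 2, 3))
print(substring_before("1999/04/01", "/"))
print(substring_after("1999/04/01", "/"))
print(normalize_space("  pretty \n   printed\ttext  "))
```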


Boolean functions

* not(boolean): negates any boolean expression.
* true(): evaluates to ''true''.
* false(): evaluates to ''false''.


Number functions

* sum(node-set): converts the string values of all the nodes found by the XPath argument into numbers, according to the built-in casting rules, then returns the sum of these numbers.


Usage examples

Expressions can be created inside predicates using the operators: =, !=, <=, <, >= and >. Boolean expressions may be combined with brackets () and the boolean operators and and or as well as the not() function described above. Numeric calculations can use *, +, -, div and mod. Strings can consist of any Unicode characters.

//item[@price > 2*@discount] selects items whose price attribute is greater than twice the numeric value of their discount attribute.

Entire node-sets can be combined ('unioned') using the vertical bar character |. Node sets that meet one or more of several conditions can be found by combining the conditions inside a predicate with 'or'.

v[x or y] | w[z] will return a single node-set consisting of all the v elements that have x or y child-elements, as well as all the w elements that have z child-elements, that were found in the current context.
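Python's xml.etree.ElementTree does not evaluate arithmetic comparisons, the or operator, or the union operator, so the two expressions above are emulated here with list comprehensions. This is a sketch over invented sample data that only mirrors the selection logic described in the text.

```python
import xml.etree.ElementTree as ET

# Invented order data for //item[@price > 2*@discount].
order = ET.fromstring(
    "<order>"
    "<item id='a' price='10' discount='4'/>"   # 10 > 8: selected
    "<item id='b' price='10' discount='6'/>"   # 10 > 12 is false
    "<item id='c' price='3' discount='1'/>"    # 3 > 2: selected
    "</order>"
)
selected = [
    item.get("id")
    for item in order.iter("item")
    if float(item.get("price")) > 2 * float(item.get("discount"))
]

# Invented context for v[x or y] | w[z].
ctx = ET.fromstring("<ctx><v><x/></v><v><y/></v><v/><w><z/></w><w/></ctx>")


def has_child(elem, tag):
    """True when elem has at least one child element with the given tag."""
    return elem.find(tag) is not None


# Iterating the children once keeps the union in document order.
union = [
    e.tag
    for e in ctx
    if (e.tag == "v" and (has_child(e, "x") or has_child(e, "y")))
    or (e.tag == "w" and has_child(e, "z"))
]

print(selected, union)
```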


Syntax and semantics (XPath 2.0)


Syntax and semantics (XPath 3)


Examples

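Given a sample XML document whose root element Wikimedia contains a projects element holding two project elements (identified by a name attribute as Wikipedia and Wiktionary), where each project has an editions element whose edition children carry a language attribute and the following text content:

* Wikipedia: en.wikipedia.org, de.wikipedia.org, fr.wikipedia.org, pl.wikipedia.org, es.wikipedia.org
* Wiktionary: en.wiktionary.org, fr.wiktionary.org, vi.wiktionary.org, tr.wiktionary.org, es.wiktionary.org

The XPath expression

* /Wikimedia/projects/project/@name

selects name attributes for all projects, and

* /Wikimedia//editions

selects all editions of all projects, and

* /Wikimedia/projects/project/editions/edition[@language='English']/text()

selects addresses of all English Wikimedia projects (text of all edition elements where the language attribute is equal to ''English''). And the following

* /Wikimedia/projects/project[@name='Wikipedia']/editions/edition/text()

selects addresses of all Wikipedias (text of all edition elements that exist under a project element with a name attribute of ''Wikipedia'').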
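The four expressions above can be approximated with Python's xml.etree.ElementTree. This is a hedged sketch: the XML below is a reconstruction consistent with the description in the text, the language values other than English are assumptions, and the paths are rewritten relative to the root element because ElementTree does not accept absolute paths on an element and returns elements rather than attribute or text nodes.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample; non-English language values are assumed.
SAMPLE = """
<Wikimedia>
  <projects>
    <project name="Wikipedia">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</Wikimedia>
"""

root = ET.fromstring(SAMPLE)  # root is the Wikimedia element

# /Wikimedia/projects/project/@name
names = [p.get("name") for p in root.findall("projects/project")]

# /Wikimedia//editions
editions = root.findall(".//editions")

# /Wikimedia/projects/project/editions/edition[@language='English']/text()
english = [
    e.text
    for e in root.findall("projects/project/editions/edition[@language='English']")
]

# /Wikimedia/projects/project[@name='Wikipedia']/editions/edition/text()
wikipedias = [
    e.text
    for e in root.findall("projects/project[@name='Wikipedia']/editions/edition")
]

print(names, len(editions), english, wikipedias)
```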


Implementations


Command-line tools

* XMLStarlet, an easy-to-use tool to test and execute XPath commands on the fly
* xmllint (libxml2)
* RaptorXML Server from Altova, which supports XPath 1.0, 2.0, and 3.0
* Xidel


C/C++

* libxml2
* Pathan
* pugixml
* Sedna XML Database
* VTD-XML
* Xalan
* XQilla


Free Pascal

* The unit XPath is included in the default libraries


Implementations for database engines

* OpenLink Virtuoso


Java

* Saxon XSLT supports XPath 1.0, XPath 2.0 and XPath 3.0 (as well as XSLT 2.0 and XQuery 3.0)
* BaseX (also supports XPath 2.0 and XQuery)
* VTD-XML
* Sedna XML Database (both XML:DB and proprietary)
* QuiXPath, a streaming open source implementation by Innovimax
* Xalan
* Dom4j

The Java package javax.xml.xpath has been part of Java standard edition since Java 5 via the Java API for XML Processing. Technically this is an XPath API rather than an XPath implementation, and it allows the programmer to select a specific implementation that conforms to the interface.


JavaScript


* jQuery XPath plugin, based on an open-source XPath 2.0 implementation in JavaScript
* FontoXPath, an open source XPath 3.1 implementation in JavaScript. Currently under development.


.NET Framework

* In the System.Xml and System.Xml.XPath namespaces
* Sedna XML Database


Perl


* XML::LibXML (libxml2)


PHP

* Sedna XML Database
* DOMXPath, via the libxml extension


Python

* The ElementTree XML API in the Python Standard Library includes limited support for XPath expressions
* libxml2
* Amara
* Sedna XML Database
* lxml
* Scrapy


Ruby

* libxml2
* Nokogiri


Scheme

* Sedna XML Database


SQL

* MySQL supports a subset of XPath from version 5.1.5 onwards
* PostgreSQL supports XPath and XSLT from version 8.4 onwards


Tcl

* The package provides a complete, compliant, and fast XPath implementation in C


Use in schema languages

XPath is increasingly used to express constraints in schema languages for XML.

* The (now ISO standard) schema language Schematron pioneered the approach.
* A streaming subset of XPath is used in W3C XML Schema 1.0 for expressing uniqueness and key constraints. In XSD 1.1, the use of XPath is extended to support conditional type assignment based on attribute values, and to allow arbitrary boolean assertions to be evaluated against the content of elements.
* XForms uses XPath to bind types to values.
* The approach has even found use in non-XML applications, such as the source code analyzer for Java called PMD: the Java is converted to a DOM-like parse tree, then XPath rules are defined over the tree.


See also

* XPath 3
* Navigational database
* XLink
* XML database
* XSL
* XSL-FO


Notes


References


External links


* XPath 1.0 specification
* XPath 2.0 specification
* XPath 3.0 specification
* XPath 3.1 specification
* XPath Reference (MSDN)
* XPath - MDC Docs by Mozilla Developer Network
* XPath introduction/tutorial
* XSLT and XPath function reference